Evaluating bacterial gene-finding HMM structures as probabilistic logic programs
Abstract
MOTIVATION: Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog.
RESULTS: We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, on two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable.
AVAILABILITY: The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
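The comparison by statistical information criteria mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which criteria were used, so AIC and BIC are assumed here as two common choices. The model names, log-likelihoods, parameter counts and sample size are all hypothetical and are not results from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: model name -> (maximised log-likelihood, free parameters).
# These numbers are invented for illustration only.
candidates = {
    "null": (-1400.0, 3),
    "three_state": (-1350.0, 12),
    "codon_aware": (-1300.0, 60),
}
n_obs = 1000  # hypothetical number of observations used for training

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n_obs)}
    for name, (ll, k) in candidates.items()
}

best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])

for name, s in scores.items():
    print(f"{name:12s} AIC={s['aic']:9.2f} BIC={s['bic']:9.2f}")
print("best by AIC:", best_by_aic)
print("best by BIC:", best_by_bic)
```

With these made-up numbers the two criteria disagree, because BIC penalises the extra parameters of the largest model more heavily than AIC does. This is one reason a benchmark may report more than one criterion alongside prediction accuracy.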
Related resources
Biological Sequence Analysis using Probabilistic Logic Generalizations of Hidden Markov Models PhD Thesis
Motivation: Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. Results: We evaluate Hidden Markov Model structures for bacter...
A Probabilistic Genome-Wide Gene Reading Frame Sequence Model
We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level – effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination...
A novel bacterial gene-finding system with improved accuracy in locating start codons.
Although a number of bacterial gene-finding programs have been developed, there is still room for improvement especially in the area of correctly detecting translation start sites. We developed a novel bacterial gene-finding program named GeneHacker Plus. Like many others, it is based on a hidden Markov model (HMM) with duration. However, it is a 'local' model in the sense that the model starts...
Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models.
MOTIVATION The hidden Markov model (HMM) is a valuable technique for gene-finding, especially because its flexibility enables the inclusion of various sequence features. Recent programs for bacterial gene-finding include the information of ribosomal binding site (RBS) to improve the recognition accuracy of the start codon, using this feature. We report here our attempt to extend the model into ...
Probabilistic Logic Methods and Some Applications to Biology and Medicine
For the computational analysis of biological problems (analyzing data, inferring networks and complex models, and estimating model parameters), it is common to use a range of methods based on probabilistic logic constructions, sometimes collectively called machine learning methods. Probabilistic modeling methods such as Bayesian Networks (BN) fall into this class, as do Hierarchical Bayesian Netwo...
Journal: Bioinformatics
Volume 28, Issue 5
Publication date: 2012